std::codecvt_utf8
From Cppreference
Defined in header <codecvt>

```cpp
template<
    class Elem,
    unsigned long Maxcode = 0x10ffff,
    std::codecvt_mode Mode = (std::codecvt_mode)0 >
class codecvt_utf8
    : public std::codecvt<Elem, char, std::mbstate_t>;
```

(deprecated in C++17)
std::codecvt_utf8 is a std::codecvt facet which encapsulates conversion between a UTF-8 encoded byte string and a UCS2 or UCS4 character string (depending on the type of Elem). This facet can be used to read and write UTF-8 files, both text and binary.
Template parameters
| Elem | - | either char16_t, char32_t, or wchar_t |
| Maxcode | - | the largest value of Elem that this facet will read or write without error |
| Mode | - | a constant of type std::codecvt_mode |
Example
The following example demonstrates the difference between UCS2/UTF-8 and UTF-16/UTF-8 conversions: the third character in the string is not a valid UCS2 character.
```cpp
#include <codecvt>
#include <iostream>
#include <locale>
#include <string>

int main()
{
    // UTF-8 data. The character U+1d10b, musical sign segno, does not fit in UCS2
    std::string utf8 = u8"z\u6c34\U0001d10b";

    // the UTF-8 / UTF-16 standard conversion facet
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> utf16conv;
    std::u16string utf16 = utf16conv.from_bytes(utf8);
    std::cout << "UTF16 conversion produced " << utf16.size() << " code points:\n";
    for (char16_t c : utf16)
        std::cout << std::hex << std::showbase << c << '\n';

    // the UTF-8 / UCS2 standard conversion facet
    std::wstring_convert<std::codecvt_utf8<char16_t>, char16_t> ucs2conv;
    try {
        std::u16string ucs2 = ucs2conv.from_bytes(utf8);
    } catch (const std::range_error& e) {
        std::u16string ucs2 = ucs2conv.from_bytes(utf8.substr(0, ucs2conv.converted()));
        std::cout << "UCS2 failed after producing " << std::dec << ucs2.size()
                  << " characters:\n";
        for (char16_t c : ucs2)
            std::cout << std::hex << std::showbase << c << '\n';
    }
}
```
Output:
```
UTF16 conversion produced 4 code points:
0x7a
0x6c34
0xd834
0xdd0b
UCS2 failed after producing 2 characters:
0x7a
0x6c34
```
See also
| Character conversions | narrow multibyte (char) | UTF-8 (char) | UTF-16 (char16_t) |
|---|---|---|---|
| UTF-16 | mbrtoc16 / c16rtomb | codecvt<char16_t, char, mbstate_t> codecvt_utf8_utf16<char16_t> codecvt_utf8_utf16<char32_t> codecvt_utf8_utf16<wchar_t> | N/A |
| UCS2 | No | codecvt_utf8<char16_t> | codecvt_utf16<char16_t> |
| UTF-32/UCS4 (char32_t) | mbrtoc32 / c32rtomb | codecvt<char32_t, char, mbstate_t> codecvt_utf8<char32_t> | codecvt_utf16<char32_t> |
| UCS2/UCS4 (wchar_t) | No | codecvt_utf8<wchar_t> | codecvt_utf16<wchar_t> |
| wide (wchar_t) | codecvt<wchar_t, char, mbstate_t> mbstowcs / wcstombs | No | No |
| codecvt | converts between character encodings, including UTF-8, UTF-16, UTF-32 (class template) |
| codecvt_mode | tags to alter behavior of the standard codecvt facets (class) |
| codecvt_utf16 | converts between UTF-16 and UCS2/UCS4 (class template) |
| codecvt_utf8_utf16 | converts between UTF-8 and UTF-16 (class template) |